musical form
Integrating Text-to-Music Models with Language Models: Composing Long Structured Music Pieces
The self-similarity (SS) matrices are downsampled to 5×5. The results indicate that, compared to MusicGen, our method produces samples that more closely resemble the Pond5 samples in terms of long-term temporal consistency and the diversity of recurring sections.

The new wave of generative models has been explored in the literature to generate music. Jukebox [1] is based on hierarchical VQ-VAEs [2] to generate multiple minutes of music. Jukebox is one of the earliest purely learning-based models that could generate longer than one minute of music with some degree of structural coherence. Notably, the authors mention that the generated music is coherent at a small scale of multiple seconds, but at a larger scale, beyond one minute, it lacks musical form. However, none of the models in the literature has demonstrated the ability to learn musical structures and forms at all scales.
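As a minimal sketch of the evaluation step mentioned above — comparing coarse structure via downsampled self-similarity matrices — the following computes a cosine SS matrix over feature frames and block-averages it to 5×5. The function names and the random "features" are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def self_similarity_matrix(features: np.ndarray) -> np.ndarray:
    """Cosine self-similarity between all pairs of feature frames.

    `features` has shape (n_frames, n_dims); the result is (n_frames, n_frames).
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-9)
    return unit @ unit.T

def downsample_ssm(ssm: np.ndarray, size: int = 5) -> np.ndarray:
    """Block-average an SS matrix down to (size, size) for coarse comparison."""
    n = ssm.shape[0]
    edges = np.linspace(0, n, size + 1).astype(int)
    out = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            out[i, j] = ssm[edges[i]:edges[i + 1], edges[j]:edges[j + 1]].mean()
    return out

# Example: 100 frames of 12-dimensional chroma-like features (random stand-in)
rng = np.random.default_rng(0)
feats = rng.random((100, 12))
coarse = downsample_ssm(self_similarity_matrix(feats), size=5)
print(coarse.shape)  # (5, 5)
```

Two downsampled matrices of the same fixed size can then be compared directly (e.g. by Frobenius distance) regardless of the original pieces' lengths.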
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
ChatMusician: Understanding and Generating Music Intrinsically with LLM
Yuan, Ruibin, Lin, Hanfeng, Wang, Yi, Tian, Zeyue, Wu, Shangda, Shen, Tianhao, Zhang, Ge, Wu, Yuhang, Liu, Cong, Zhou, Ziya, Ma, Ziyang, Xue, Liumeng, Wang, Ziyu, Liu, Qin, Zheng, Tianyu, Li, Yizhi, Ma, Yinghao, Liang, Yiming, Chi, Xiaowei, Liu, Ruibo, Wang, Zili, Li, Pengfei, Wu, Jingcheng, Lin, Chenghua, Liu, Qifeng, Jiang, Tao, Huang, Wenhao, Chen, Wenhu, Benetos, Emmanouil, Fu, Jie, Xia, Gus, Dannenberg, Roger, Xue, Wei, Kang, Shiyin, Guo, Yike
While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning of LLaMA2 on a text-compatible music representation, ABC notation, so that music is treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer, without any external multi-modal neural structures or tokenizers. Interestingly, endowing musical abilities does not harm language abilities, even achieving a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music conditioned on texts, chords, melodies, motifs, musical forms, etc., surpassing the GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 in the zero-shot setting by a noticeable margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. We release our 4B-token music-language corpus MusicPile, the collected MusicTheoryBench, code, model, and demo on GitHub.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > China > Hong Kong (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
In-depth analysis of music structure as a text network
Tsai, Ping-Rui, Chou, Yen-Ting, Wang, Nathan-Christopher, Chen, Hui-Ling, Huang, Hong-Yue, Luo, Zih-Jia, Hong, Tzay-Ming
Music, enchanting and poetic, permeates every corner of human civilization. Although music is not unfamiliar to people, our understanding of its essence remains limited, and there is still no universally accepted scientific description. This is primarily due to music being regarded as a product of both reason and emotion, making it difficult to define. In this article, we focus on the fundamental elements of music and construct an evolutionary network from the perspective of music as a natural language, aligning with the statistical characteristics of texts. Through this approach, we aim to comprehend the structural differences in music across different periods, enabling a more scientific exploration of music. Relying on the advantages of structuralism, we can concentrate on the relationships and order between the physical elements of music, rather than getting entangled in the blurred boundaries of science and philosophy. The scientific framework we present not only conforms to past conclusions in music, but also serves as a bridge that connects music to natural language processing and knowledge graphs.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (4 more...)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine (0.93)
Musical Form Generation
While recent generative models can produce engaging music, their utility is limited. The variation in the music is often left to chance, resulting in compositions that lack structure. Pieces extending beyond a minute can become incoherent or repetitive. This paper introduces an approach for generating structured, arbitrarily long musical pieces. Central to this approach is the creation of musical segments using a conditional generative model, with transitions between these segments. The generation of prompts that determine the high-level composition is distinct from the creation of finer, lower-level details. A large language model is then used to suggest the musical form.
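The two-level approach described in this abstract — a language model proposing the high-level form, a conditional generator filling in segments — can be sketched roughly as below. Both `fake_llm` and `fake_generator` are toy stand-ins, and none of the function names come from the paper; this is only a hedged illustration of the pipeline's shape.

```python
# Hypothetical sketch: an LLM proposes a musical form as section labels,
# and a conditional generator produces one segment per distinct label,
# reusing cached segments when a label repeats.

def propose_form(llm, style: str) -> list[str]:
    """Ask the LLM for a section plan, e.g. ['A', 'A', 'B', 'A']."""
    reply = llm(f"Propose a musical form for a {style} piece as section letters.")
    return [token for token in reply.split() if token.isalpha()]

def compose(llm, generator, style: str) -> list[str]:
    form = propose_form(llm, style)
    sections: dict[str, str] = {}   # cache so repeated labels reuse material
    piece = []
    for label in form:
        if label not in sections:
            sections[label] = generator(prompt=f"{style}, section {label}")
        piece.append(sections[label])
    return piece

# Toy stand-ins so the sketch runs end-to-end
fake_llm = lambda prompt: "A A B A"
fake_generator = lambda prompt: f"<audio segment for '{prompt}'>"
print(compose(fake_llm, fake_generator, "ambient"))
```

In a real system the transitions between segments would also be generated conditionally, as the abstract notes, rather than segments simply being concatenated.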
- Media > Music (1.00)
- Leisure & Entertainment (0.97)
MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks
Lu, Peiling, Tan, Xu, Yu, Botao, Qin, Tao, Zhao, Sheng, Liu, Tie-Yan
Humans usually compose music by organizing elements according to a musical form to express musical ideas. However, for neural network-based music generation, this is difficult due to the lack of labelled data on musical form. In this paper, we develop MeloForm, a system that generates melody with musical form using expert systems and neural networks. Specifically, 1) we design an expert system to generate a melody by developing musical elements from motifs to phrases and then to sections, with repetitions and variations according to a pre-given musical form; 2) since the generated melody can lack musical richness, we design a Transformer-based refinement model to improve the melody without changing its musical form. MeloForm enjoys the advantages of precise musical form control via expert systems and musical richness learning via neural models. Both subjective and objective experimental evaluations demonstrate that MeloForm generates melodies with precise musical form control at 97.79% accuracy, and outperforms baseline systems in subjective evaluation scores by 0.75, 0.50, 0.86, and 0.89 in structure, thematic content, richness, and overall quality, without any labelled musical form data. Besides, MeloForm can support various kinds of forms, such as verse-and-chorus form, rondo form, variational form, sonata form, etc.
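The expert-system idea in this abstract — developing a motif into phrases and assembling phrases into sections per a pre-given form — can be illustrated with a deliberately trivial sketch. The "variation" here is a bare transposition and all names are invented for illustration; MeloForm's actual rules are far richer.

```python
# Illustrative sketch (not MeloForm's code): expand a motif into sections
# according to a given form string, reusing material for repeated labels.

def vary(motif: list[int], shift: int) -> list[int]:
    """A trivial 'variation': transpose the motif by `shift` semitones."""
    return [pitch + shift for pitch in motif]

def build_piece(motif: list[int], form: str) -> list[int]:
    sections: dict[str, list[int]] = {}
    # Each distinct label gets its own variation of the motif;
    # a section is a phrase followed by its literal repetition.
    for i, label in enumerate(sorted(set(form), key=form.index)):
        phrase = vary(motif, shift=2 * i)
        sections[label] = phrase + vary(phrase, shift=0)
    melody = []
    for label in form:
        melody.extend(sections[label])
    return melody

# A 4-note motif (MIDI pitches) expanded through an AABA form
piece = build_piece([60, 62, 64, 62], form="AABA")
print(len(piece))  # 32
```

A neural refinement stage, as in the paper, would then rewrite surface details of `piece` while preserving which sections repeat.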
How an AI finished Beethoven's last symphony and what that means for the future of music
When he died in 1827, aged 56, Ludwig van Beethoven left his 10th symphony unfinished. Only a few handwritten notes briefly detailing his plans for the piece have survived, most of them incomplete ideas or fragments of themes or melodies. Now, a multidisciplinary team of computer scientists at the Rutgers University-based start-up Playform AI has trained an artificial intelligence to mimic the great composer's style and used it to write a complete symphony based on these initial sketches. We spoke to the lead researcher on the project, Professor Ahmed Elgammal, to find out more. Beethoven left sketches in different forms, mainly musical sketches, but also some written notes with ideas as well.
- Media > Music (0.50)
- Leisure & Entertainment (0.50)